
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(19) World Intellectual Property Organization 

International Bureau 

(43) International Publication Date 
9 October 2003 (09.10.2003) 




PCT 







(10) International Publication Number 

WO 03/083720 A2 



(51) International Patent Classification 7 : G06F 17/30 

(21) International Application Number: PCT/GB03/01434 

(22) International Filing Date: 2 April 2003 (02.04.2003) 

(25) Filing Language: English 

(26) Publication Language: English 



(30) Priority Data: 

0207749.3 



3 April 2002 (03.04.2002) GB 



(71) Applicant (for all designated States except US): BIOWISDOM
LIMITED [GB/GB]; Merlin Place, Milton Road,
Cambridge CB4 0DP (GB). 

(72) Inventors; and 

(75) Inventors/Applicants (for US only): BAXTER, Gordon, 
Smith [GB/GB]; 8 Pryors Orchard, Melbourn, Cambridge 
SG8 6UT (GB). TILFORD, Nick [GB/GB]; Alexandra, 
Wicken Road, Clavering, Essex CB11 4QT (GB). 



(74) Agent: GILL JENNINGS & EVERY; Broadgate House, 
7 Eldon Street, London EC2M 7LH (GB). 

(81) Designated States (national): AE, AG, AL, AM, AT, AU, 
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU, 
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, 
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, 
MX, MZ, NI, NO, NZ, OM, PH, PL, PT, RO, RU, SC, SD, 
SE, SG, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, US, 
UZ, VC, VN, YU, ZA, ZM, ZW. 

(84) Designated States (regional): ARIPO patent (GH, GM, 
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW), 
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), 
European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE, 
ES, FI, FR, GB, GR, HU, IE, IT, LU, MC, NL, PT, RO,
SE, SI, SK, TR), OAPI patent (BF, BJ, CF, CG, CI, CM, 
GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG). 




(54) Title: DATABASE SEARCHING METHOD AND SYSTEM







(57) Abstract: A method and system are described for searching a
plurality of information databases (2,3,4) for records related to an
input search term. The method comprises selecting a group of related
search terms containing the input search term from a search
database (7) of terms arranged in predefined groups according to
their relationship with one another. Each term is present within one
or more of the information databases (2,3,4). A data repository (5)
is searched for terms from the selected group, the data repository
comprising selected data previously extracted from the records of
each information database (2,3,4). The search identifies the
corresponding records within the information databases which contain
the terms within the selected group.
DATABASE SEARCHING METHOD AND SYSTEM

The present invention relates to a method and system 
for searching a plurality of information databases. 

Databases are well known and widely used for the 
organized storage of information. Depending upon the 
application in question, in many cases there is a great 
demand for the provision of searching methods to enable the 
stored information to be selectively accessed by a user. 
For this reason, a great deal of investment is often made 
in the production, updating and on-going development of 
such databases. The provision of improved searching 
methods forms part of this development. 

In fields of particular scientific or commercial
interest there often exist a number of databases providing
related and/or overlapping information. These databases
might result directly from different competing database
suppliers or, for example, from the independent generation
and cataloguing of information.

One particular example of the use of numerous
databases is in the field of biomedical science. The
biomedical domain is a multi-disciplinary domain
encompassing all areas of biology and medicine. There is
a large and ever-increasing volume of electronic biomedical
information present upon a number of databases, which are
individually dedicated to particular fields within the
biomedical discipline.

Access to such information in cases such as these is
unfortunately frustrated by the large number of disparate
data sources and the lack of a standard nomenclature
used between them.

Although a multitude of nomenclature or classification
systems exist, there is a lack of consistency relating to
their architecture and content. This hinders the ease with
which the databases can be accessed. The content can also
vary between such databases, as expertly annotated
versions tend to have narrow discipline-related
perspectives, do not cover historical terms and indeed are
not contemporaneous.

As a result, database users tend to focus their 
investigations upon single databases with which they are 
familiar. This has associated disadvantages in that 
information which is highly relevant to the user may be
present upon one or more databases covering overlapping or 
related fields but this information will not become known 
to the user. 

One of the main problems in such interrelated 
disciplines is that particular terms used in one discipline 
may not be identical to those used in a different 
discipline (a lack of semantic normalisation) and therefore 
automatic computer-based searching is severely limited. 
Furthermore, the arrangement of the information within such 
databases is generally unique to the database in question. 
Performing a search upon multiple databases of this
kind therefore often requires laborious searching on the
individual databases, with a detailed knowledge of
each subject being needed in order to perform a high-quality
search.

There is therefore a need to provide an improved 
searching method to enable searching across multiple 
databases.

In accordance with a first aspect of the present
invention we provide a method of searching a plurality of
information databases for records related to an input
search term, comprising:-

selecting a group of related search terms containing
the input search term, from a search database of terms
arranged in predefined groups according to their
relationship with one another, wherein each term is present
within one or more of the information databases; and,

searching for terms from the selected group within a
data repository comprising selected data previously
extracted from the records of each information database, to
identify the corresponding records within the information
databases which contain the terms within the selected
group.
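The two claimed steps can be illustrated with a minimal Python sketch. It is not the applicant's implementation. For illustration only, it assumes that the search database is a list of term groups held as sets, and that each repository record carries its extracted terms plus a (database, record identifier) pointer. The names `RepoRecord`, `select_groups` and `search_repository`, and the identifiers "SP-0001" and "GB-0002", are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoRecord:
    """Hypothetical repository record: extracted terms plus a pointer."""
    source_db: str      # information database the data was extracted from
    record_id: str      # identifier of the full record in that database
    terms: frozenset    # selected data previously extracted from the record


def select_groups(search_db, input_term):
    """Step 1: return every predefined group containing the input term."""
    return [group for group in search_db if input_term in group]


def search_repository(repository, group):
    """Step 2: return pointers to records containing any term of the group."""
    return [(r.source_db, r.record_id) for r in repository if r.terms & group]


# Illustrative data only.
search_db = [
    frozenset({"HTR2B", "5-HT2B", "Serotonin 2B receptor"}),
    frozenset({"HTR2A", "5-HT2A"}),
]
repository = [
    RepoRecord("Genbank", "AF156159", frozenset({"HTR2B", "DNA"})),
    RepoRecord("Swissprot", "SP-0001", frozenset({"5-HT2B"})),
    RepoRecord("Genbank", "GB-0002", frozenset({"HTR2A"})),
]

groups = select_groups(search_db, "5-HT2B")
hits = search_repository(repository, groups[0])
print(hits)  # [('Genbank', 'AF156159'), ('Swissprot', 'SP-0001')]
```

The Genbank record is returned even though it never contains the literal input term "5-HT2B". The repository search is driven by the selected group, not by the input term alone.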

The present invention overcomes many of the problems
associated with searching a plurality of information
databases, in that groups of related search terms are used
to search upon the various databases provided. The
semantic integration of information within multiple
databases is very important to this process and the use of
an ontology (or similar knowledge base) can provide the
framework for this normalisation.

The terms are preferably made available through an
ontology, knowledge base or thesaurus. These groups are
predefined and, when an inputted search term is provided by
a user, the search database is queried in order to select
the one or more groups containing this inputted search
term. In particular, this allows dissimilar terms having
identical or similar meanings to be searched upon the
plurality of information databases. This greatly improves
the power of the searching technique (for example, the
precision and recall of a query) and directly allows
searching to be extended beyond a single database to multiple
databases. The speed of multiple database searching is
therefore improved as a result.

The method particularly benefits typical users who are
familiar with only a single discipline, in that searching
across multiple disciplines becomes possible without a
detailed knowledge of these other disciplines being
required.

The present invention is not limited to any particular
types of information databases nor to the subject matter of
their contents. However, the invention is particularly
advantageous for use in cases where a number of large and
complex information databases are provided, each providing
related or overlapping information. This is notably the
case in the biomedical field.

The present invention also recognises the problem
that, for many databases, searching for information within
more than one database may increase the amount of processor
time required for searching. This is addressed by
previously extracting selected data from the various
information databases and storing it in a dedicated data
repository. Only selected data is normally needed for
search purposes, because with most types of search it is
not necessary to search through all data contained within
each record of the information databases. One example of
this is in the searching of a biotechnology database in
which lengthy gene sequences are provided but the searching
of these actual sequences is not required. For a search
relating to the causes of disease, for instance, such
sequences represent a large amount of redundant data.

It is therefore advantageous to extract data from the
records of such information databases and to store the data
separately in a data repository, such that the speed and
efficiency with which the data may be searched can be
improved.
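A minimal sketch of the extraction step follows. For illustration, it assumes that a full record is a Python dict mapping field names to values and that the administrator's choice of fields is a fixed set. `SELECTED_FIELDS` and `extract_selected` are hypothetical names, and the "ORIGIN" entry stands in for a field holding a long sequence that is never searched.

```python
# Hypothetical administrator choice: fields worth copying into the repository.
SELECTED_FIELDS = frozenset({"LOCUS", "DEFINITION", "ACCESSION", "ORGANISM"})


def extract_selected(full_record, selected=SELECTED_FIELDS):
    """Copy only the selected fields; bulky unsearched data is left behind."""
    return {name: value for name, value in full_record.items() if name in selected}


full_record = {
    "LOCUS": "HSHTR2B2",
    "DEFINITION": "Homo sapiens 5-hydroxytryptamine 2B receptor (HTR2B) gene, exon 2.",
    "ACCESSION": "AF156159",
    "ORGANISM": "Homo sapiens",
    "ORIGIN": "acgt" * 2500,  # placeholder for a 10,000-base sequence
}

repo_record = extract_selected(full_record)
print(sorted(repo_record))  # ['ACCESSION', 'DEFINITION', 'LOCUS', 'ORGANISM']
```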

The data repository is preferably arranged as a number
of records, with a repository record corresponding to a
record present within one of the information databases.
There is therefore preferably a direct correspondence
between the number of individual records in the information
databases and the number of individual records in the
repository. Each record in the repository preferably
further comprises a pointer identifying the specific record
in the information database to which it relates. This is
used to allow a user to access the full record when
required.

In the case of a direct correspondence of records
between the repository and databases, this access may be
achieved by simply using identical record identifiers (such
as gene accession numbers). However, in cases of non-direct
correspondence, a specific and separate pointer to the
particular record is used.
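The two pointer cases can be sketched as follows. This assumes, hypothetically, that a pointer is a (database, record identifier) tuple and that the separate pointers for non-direct correspondence are held in a plain dict. `resolve_pointer` and the identifier "repo-42" are illustrative only.

```python
def resolve_pointer(repo_id, source_db, pointer_table=None):
    """Return (database, record id) of the full record behind a repository record.

    Direct correspondence: the repository reuses the source identifier
    (e.g. a gene accession number), so no lookup is needed.
    Non-direct correspondence: a separate pointer table is consulted and
    supplies both the database and the record identifier.
    """
    if pointer_table is None:
        return (source_db, repo_id)
    return pointer_table[repo_id]


# Direct correspondence: the accession number doubles as the pointer.
print(resolve_pointer("AF156159", "Genbank"))  # ('Genbank', 'AF156159')

# Non-direct correspondence: a hypothetical internal id mapped to the source record.
pointers = {"repo-42": ("Genbank", "AF156159")}
print(resolve_pointer("repo-42", "Genbank", pointers))  # ('Genbank', 'AF156159')
```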
Due to the extraction of the data from the information
databases, the amount of selected data in the repository is
typically less than that contained in the information
databases. The degree to which the former amount is
smaller depends upon the particular type of record
used and the fields which are to be searched within
each record.

In general, the data in the repository comprises
definitional and/or semantic data. The definitional data
preferably describes data in terms of its nature, use or
value, whereas the semantic data preferably describes
alternative terms for the data in the information
databases. Generally, the semantic data describes
synonymous terms in the information databases.

Within the search database, each term preferably has
corresponding meta-data indicating the one or more
information databases within which the particular term is
contained. This information is used to reduce needless
searching upon databases where it is known that no such
term is present, and therefore increases the search speed
during use. Such meta-data also preferably indicates the
one or more fields of the information database(s) within
which the term is contained, since each information
database generally has a unique format.
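How such meta-data might prune a search can be sketched as follows. This assumes, hypothetically, that the meta-data for each term is a dict mapping a database name to the set of fields containing that term. The field names "DE" and "TITLE" and the function `plan_searches` are illustrative, not taken from the source.

```python
# Hypothetical term meta-data: term -> {information database: fields containing it}.
TERM_LOCATIONS = {
    "HTR2B": {"Genbank": {"DEFINITION", "FEATURES"}, "OMIM": {"TITLE"}},
    "5-HT2B": {"Swissprot": {"DE"}},
}


def plan_searches(terms, selected_dbs, locations=TERM_LOCATIONS):
    """Map each database to the (term, field) pairs that are worth searching.

    A term is only searched in the databases (and fields) where the
    meta-data says it occurs, so needless searches are skipped.
    """
    plan = {}
    for term in terms:
        for db, fields in locations.get(term, {}).items():
            if db in selected_dbs:
                plan.setdefault(db, set()).update((term, f) for f in fields)
    return plan


plan = plan_searches(["HTR2B", "5-HT2B"], {"Genbank", "Swissprot"})
print(sorted(plan))  # ['Genbank', 'Swissprot']
print(sorted(plan["Genbank"]))  # [('HTR2B', 'DEFINITION'), ('HTR2B', 'FEATURES')]
```

In this example "HTR2B" is never searched in Swissprot, because the meta-data does not list it there.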

Preferably the terms in the predefined groups are
arranged within the search database such that the
predefined groups are formed from synonymous terms. Each
group is also typically provided with a unique group
identifier.

Because an inputted search term may be found within more
than one group, the method preferably further comprises
determining the context of the records retrieved using the
inputted search term (and associated group of terms). Once
the groups in which the term is present have been
identified, the context of each record may be determined
either during the repository search itself (to limit the
number of records returned) or later, following the
selection of all records containing any terms in the group.

The context may be determined based upon the field
type of the repository record in which the term is found,
such as a "domain". Alternatively, or additionally, the
context may be determined by searching for the presence of
one or more of the other terms within the group in the
same field or record of the repository. This allows
automatic selection of the correct search subject.

In general, the method according to the first aspect
of the invention is performed by a computer program
comprising suitable computer program code means. Such a
computer program may be retained upon a computer-readable
medium.

In accordance with a second aspect of the present
invention, we provide a database searching system for
searching a plurality of information databases for records
related to an inputted search term, the system comprising:-

a search database comprising related search terms
arranged into predefined groups according to their
relationship to one another, wherein each term is present
within one or more of the information databases;

selection means for selecting a group containing the
inputted search term from the search database;

a data repository comprising selected data previously
extracted from the records of each information database;
and,

searching means for searching the repository for terms
from the selected group to identify the corresponding
records within the information databases which contain the
terms within the selected group.

Typically, therefore, the search database and the
searching system itself are based on an ontology.

Preferably the search term is provided to the system
using an input means which may take the form of a local
input device, or alternatively a communication network such
as the Internet. The use of a communication network allows
users to access the system from remote locations. The
system may also comprise the information databases
themselves, although typically these are located
remotely from the data repository. The selection and
searching means are typically provided as a combined query
system upon a computer. This computer may also contain
either or both of the data repository and the search
database.

An example of a multiple database search method and
system according to the present invention will now be
described, with reference to the accompanying drawings, in
which:-

Figure 1 is a schematic representation of the search
system; and

Figure 2 is a flow diagram of a method of searching
using the search system.

A multiple database system relating to the field of
biomedical science is generally indicated at 1 in Figure 1.

A number of individual proprietary information
databases are indicated at 2, 3 and 4. Examples of these
databases include "Genbank" (National Centre For
Biotechnology Information), "Swissprot" (European
Bioinformatics Institute), "OMIM" (National Centre For
Biotechnology Information) and "UMLS" (National Library Of
Medicine). In this example, three information databases
are provided relating to gene sequences and genetic
disorders.

A data repository 5 is arranged in communication with
each of the information databases 2, 3, 4. The data
repository 5 is organised as a database, stored on a local
computer server. The information databases 2, 3, 4 are
stored upon remote servers and accessed by the data
repository 5 using a suitable network such as the Internet.
A query system 6 is arranged to access the data
repository 5 and is implemented by suitable software
running upon a local computer (which may be the server upon
which the data repository 5 is stored).
A separate search database 7 (knowledge base or
ontology) is also provided on the query system computer and
this is arranged to be accessed by the query system 6. An
input means 8 is provided to allow a user of the system to
access the query system 6. In the present example, the
input means 8 is a remote computer connected, via a
communication network such as the Internet, to the query
system 6. Alternatively, it could be a local input device
such as a keyboard attached to the query system computer.

Regarding the information databases 2, 3, 4, these are
generally arranged as a large number of records, with each
record corresponding to a particular entity. In the case
of the Genbank database, the records are arranged according
to individual gene sequences. Each record contains a large
number of fields. Examples of these for the Genbank
information database include: LOCUS, DEFINITION, ACCESSION,
VERSION, KEYWORDS, SEGMENT, SOURCE, ORGANISM, REFERENCE,
AUTHORS, TITLE, JOURNAL. A large amount of data is
therefore provided in each record and not all of it is
useful for searches of the type provided by the system of
this example.

The data repository 5 provides a copy of each record
within each of the information databases 2, 3, 4 and
therefore mirrors the content of these databases. However,
for each record, only data within selected fields is
retained within the data repository 5 and therefore records
within the data repository contain substantially less data
than the full records upon the respective information
databases. Which fields are copied into the data
repository 5 is determined by the administrator of the
system 1 and depends upon the type of searching services
which are to be provided to a user.

Table 1 shows part of a record within the data
repository 5 relating to the Genbank record for the HTR2B
gene (AF156159).
TABLE 1

Extracted Term                     Genbank Field              Meta-Data Type         Meta-Data Field
HSHTR2B2                           LOCUS                      definitional/semantic  SYNONYM
DNA                                LOCUS                      definitional           DOMAIN
21-APR-2000                        LOCUS                      definitional           ENTRY DATE
HTR2B                              DEFINITION                 semantic               SYNONYM
Homo sapiens 5-hydroxytryptamine   DEFINITION                 definitional           DEFINITION
2B receptor (HTR2B) gene, exon 2.
AF156159                           ACCESSION                  definitional/semantic  SYNONYM
Homo sapiens                       ORGANISM                   definitional           SPECIES
HTR2B                              FEATURES / mRNA / gene     semantic               SYNONYM
5-hydroxytryptamine 2B receptor    FEATURES / mRNA / product  semantic               SYNONYM
HTR2B                              FEATURES / gene / gene     semantic               SYNONYM
HTR2B                              FEATURES / CDS / gene      semantic               SYNONYM
5-hydroxytryptamine 2B receptor    FEATURES / CDS / product   semantic               SYNONYM

In addition to the "Extracted Term" data and the
"Genbank Field" data, extracted from Genbank and retained
in the respective columns, the "Meta-Data Type" and
"Meta-Data Field" columns of Table 1 provide additional
information defining the type of data which is contained in
the respective field. This is described as "meta-data"
because data in these fields describe the data obtained
from the information databases 2,3,4. Two types of
meta-data are used in this example system, these being
"definitional" and "semantic".

Definitional meta-data is information that is used to
uniquely describe and/or categorise data in terms of its
nature, use, value and encumbrances. Semantic meta-data
provides alternative terms for data, such as synonyms or
cross-references. Semantic meta-data is used to infer
equality in meaning between data from the information
databases 2,3,4. These two types of meta-data are not
mutually exclusive and therefore meta-data can be both
definitional and semantic. For example, a gene name for a
data record may be both definitional and semantic meta-data.

The "Meta-Data Type" column shows the kind of meta-data
to which each extracted field relates and the "Meta-Data
Field" column defines a corresponding meta-data field
for searching purposes. It can be seen in this latter case
that a number of the fields from the information databases
are assigned to the same meta-data field, namely "SYNONYM".
In this particular record, the term "DNA" is assigned to
the "DOMAIN" meta-data field. The use of domains is
described in more detail later.
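The effect of assigning several Genbank fields to a single meta-data field can be shown with a short sketch over a subset of the rows of Table 1. The tuple layout and the function names are assumptions made for illustration. The point is that one "SYNONYM" bucket gathers terms from LOCUS, DEFINITION, ACCESSION and FEATURES, and duplicates collapse.

```python
from collections import defaultdict

# A subset of Table 1: (extracted term, Genbank field, meta-data types, meta-data field).
TABLE_1_ROWS = [
    ("HSHTR2B2", "LOCUS", {"definitional", "semantic"}, "SYNONYM"),
    ("DNA", "LOCUS", {"definitional"}, "DOMAIN"),
    ("21-APR-2000", "LOCUS", {"definitional"}, "ENTRY DATE"),
    ("HTR2B", "DEFINITION", {"semantic"}, "SYNONYM"),
    ("Homo sapiens 5-hydroxytryptamine 2B receptor (HTR2B) gene, exon 2.",
     "DEFINITION", {"definitional"}, "DEFINITION"),
    ("AF156159", "ACCESSION", {"definitional", "semantic"}, "SYNONYM"),
    ("Homo sapiens", "ORGANISM", {"definitional"}, "SPECIES"),
    ("HTR2B", "FEATURES / mRNA / gene", {"semantic"}, "SYNONYM"),
    ("5-hydroxytryptamine 2B receptor", "FEATURES / mRNA / product",
     {"semantic"}, "SYNONYM"),
]


def index_by_metadata_field(rows):
    """Group extracted terms under their meta-data field (duplicates collapse)."""
    index = defaultdict(set)
    for term, _genbank_field, _types, meta_field in rows:
        index[meta_field].add(term)
    return dict(index)


def terms_of_type(rows, meta_type):
    """Return the terms whose meta-data types include meta_type."""
    return {term for term, _field, types, _meta in rows if meta_type in types}


index = index_by_metadata_field(TABLE_1_ROWS)
print(sorted(index["SYNONYM"]))
# ['5-hydroxytryptamine 2B receptor', 'AF156159', 'HSHTR2B2', 'HTR2B']
```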

Each record within the repository 5 also has
associated data in the form of a "pointer" which
identifies the database and record from which the data was
obtained. In this case, the Genbank field "ACCESSION" is
used to identify the record, and separate data (not shown in
Table 1) identifies the Genbank database.

Turning now to the search database 7, this is also
arranged as a number of records, each record defining a
group of synonymous terms. These terms are obtained from
the information databases 2,3,4 and may include not only
synonymous terms within the same database but also
synonymous terms between different information databases.
Each record in search database 7 may also define broader
and/or narrower related terms. Table 2 is an example of
extracted synonyms from the Genbank record shown in
Table 1.
TABLE 2

Identifier    Synonym                            Preferred Term
012345678     HSHTR2B2                           HTR2B
012345678     HTR2B                              HTR2B
012345678     AF156159                           HTR2B
012345678     5-hydroxytryptamine 2B receptor    HTR2B


Each synonym is assigned to a particular group
identified with a corresponding group identifier which is
internal to the system. Additionally, each group of
synonyms has a "preferred" term which typically is the most
commonly used or most convenient term for explanatory
purposes. However, whether or not the preferred term itself
is used as the inputted search term does not affect the
search scope.

Table 3 shows part of a typical record upon the search
database 7, containing synonyms extracted from the three
information databases 2, 3, 4, for example Genbank,
Swissprot and OMIM. Any degeneracy between the terms
extracted from these information databases is removed.

TABLE 3

Identifier    Synonym                            Preferred Term
012345678     HSHTR2B2                           HTR2B
              HTR2B
              AF156159
              5-hydroxytryptamine 2B receptor
              5-HT2B
              5HT2B
              Serotonin 2B receptor

Referring back to Table 1, it can be seen that each of
the extracted terms which were assigned to the "SYNONYM"
meta-data field is also found within the same record in
Table 3 (as the first four entries in the "Synonym"
column). The use of the meta-data field increases the
searching speed when a search for synonymous terms is being
performed within the records of the data repository 5, as
searching in other fields is not needed. It should be
remembered that the data repository 5 contains records from
a number of different information databases 2,3,4, and it
is therefore the assignment of a common meta-data field
that produces this speed increase.
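A field-restricted synonym search over records from several databases might look like the sketch below. The repository layout (pointer mapped to meta-data fields, each holding a set of terms) and the record identifiers "omim-1" and "omim-2" are assumptions for illustration. The group is the Table 3 record 012345678. Only the SYNONYM field is inspected, whichever database the record came from.

```python
# Hypothetical repository: pointer -> {meta-data field: set of extracted terms}.
REPOSITORY = {
    ("Genbank", "AF156159"): {"SYNONYM": {"HSHTR2B2", "HTR2B", "AF156159"},
                              "DOMAIN": {"DNA"}},
    ("OMIM", "omim-1"): {"SYNONYM": {"5-HT2B"}, "DOMAIN": {"disease"}},
    ("OMIM", "omim-2"): {"SYNONYM": {"OTHER-TERM"}, "DOMAIN": {"disease"}},
}


def synonym_search(repository, group):
    """Return sorted pointers of records whose SYNONYM field overlaps the group.

    Only the SYNONYM meta-data field is inspected, whatever database the
    record came from; no other field is scanned.
    """
    return sorted(pointer for pointer, fields in repository.items()
                  if fields.get("SYNONYM", set()) & group)


# Synonyms of search-database group 012345678 (Table 3).
group = {"HSHTR2B2", "HTR2B", "AF156159", "5-hydroxytryptamine 2B receptor",
         "5-HT2B", "5HT2B", "Serotonin 2B receptor"}
hits = synonym_search(REPOSITORY, group)
print(hits)  # [('Genbank', 'AF156159'), ('OMIM', 'omim-1')]
```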

Further information is also present within the records
of the search database 7. For example, for each synonym,
an identifier is provided to identify the database(s), and
in some cases the field(s), in which the term is present.
Each of the search database records also contains a brief
textual description of the subject to which the synonyms
relate, such as "Gene that encodes the 5-hydroxytryptamine
2B receptor".

Figure 2 shows a flow diagram of a suitable method for
use in the database searching system 1. At step 100 in
Figure 2, a user of the system inputs a search term using
the input means 8. At step 101, other information is also
provided; for example, the user selects a number of
information databases upon which to search for the search
term and, possibly, a limitation to one or more field types
in which to search for this term.

In the present example, each of the databases 2,3,4 is
selected and the user chooses all field types for
searching. At step 102, the query system 6 analyses the
input search term and then searches upon the search
database 7 for any records containing the input search
term. This returns one or more "hits", that is, records
containing the search term as one of the synonymous terms.
These records are then retrieved at step 103 and presented
to the user.

In some cases, the search term will be present in more
than one of the records upon the search database 7. In this
case, the user can view the textual description attached to
each record in order to select the type of information
required.

Having reviewed the record descriptions, at step 104,
the user selects the particular record to which the
intended search relates. At step 105, the synonymous terms
held in the selected record of the search database 7 are
then searched in the required fields of the records held in
the data repository 5. Only those fields corresponding to
the particular information databases selected by the user
are searched and the results are then returned to the user
at step 106.
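Steps 102 to 106 can be sketched as follows. In this sketch the "user choice" at step 104 is simply the first candidate. The `GroupRecord` layout, the second group "000000001" and the OMIM identifier "o1" are hypothetical, chosen only to show an input term that appears in two search-database records.

```python
from dataclasses import dataclass, field


@dataclass
class GroupRecord:
    """Hypothetical search-database record (compare Table 3)."""
    identifier: str
    preferred_term: str
    description: str
    synonyms: set = field(default_factory=set)


def find_groups(search_db, term):
    """Steps 102-103: search-database records containing the input term."""
    return [g for g in search_db if term in g.synonyms]


def search_selected(repository, group, selected_dbs):
    """Step 105: search repository records from the user-selected databases only."""
    return [(db, rid) for (db, rid), terms in repository.items()
            if db in selected_dbs and terms & group.synonyms]


search_db = [
    GroupRecord("012345678", "HTR2B",
                "Gene that encodes the 5-hydroxytryptamine 2B receptor",
                {"HTR2B", "5-HT2B", "AF156159"}),
    GroupRecord("000000001", "OTHER", "Hypothetical second group", {"5-HT2B"}),
]
repository = {("Genbank", "AF156159"): {"HTR2B"}, ("OMIM", "o1"): {"5-HT2B"}}

candidates = find_groups(search_db, "5-HT2B")    # two hits: the user must choose
print([g.identifier for g in candidates])        # ['012345678', '000000001']
chosen = candidates[0]                           # step 104: chosen by description
results = search_selected(repository, chosen, {"Genbank", "OMIM"})
print(results)  # [('Genbank', 'AF156159'), ('OMIM', 'o1')]
```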

At step 107 a context filtering step is performed
which analyses the records in order to discard or
categorise records which are unlikely to be related to the
desired search. For example, in a case where more than one
search database record is initially returned, there will
exist at least one synonym (the search term) which is used
upon the information databases in different contexts.
It is desirable to prevent the delivery of records which do
not relate to the context of interest. This is achieved by
context filtering.

The method chosen for this filtering depends upon the
way in which the information databases are structured. In
the case of more unstructured databases, for example
databases of the full text of scientific publications, an
appropriate filtering technique is to search the records
for other words relating to the context of interest (such
as the other synonyms in the group). If none are found,
then the record in question can be assigned a low
likelihood of relevance. If desired, this likelihood can be
expressed as a numerical score for filtering and/or
presented to the user.
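A minimal version of this filter for free text is sketched below. The scoring rule (the fraction of the group's other synonyms found as whole words, case-insensitively) is an assumption made for illustration, not the applicant's formula. A score of 0.0 marks a record of low likelihood of relevance.

```python
import re


def relevance(text, search_term, group):
    """Fraction of the group's other synonyms that also occur in the text.

    Matching is whole-word and case-insensitive; 0.0 means no supporting
    synonym was found, i.e. a low likelihood of relevance.
    """
    others = [s for s in group if s != search_term]
    if not others:
        return 0.0
    found = sum(1 for s in others
                if re.search(r"\b" + re.escape(s) + r"\b", text, re.IGNORECASE))
    return found / len(others)


group = ["5-HT2B", "HTR2B", "Serotonin 2B receptor"]
text_a = "The 5-HT2B (serotonin 2B receptor) is encoded by HTR2B."
text_b = "Here 5-HT2B is used in an unrelated sense."
print(relevance(text_a, "5-HT2B", group))  # 1.0
print(relevance(text_b, "5-HT2B", group))  # 0.0
```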

For example, suppose a query has been performed on a term
"C" and all its synonyms, and the search database states that
C is a sub-class of B and B is a sub-class of A, while D
and E are sub-classes of C. A series of queries is then
performed against the results set for C using synonyms of
A, B, D and E sequentially. From the results of these
queries, the records in the results set for term C can be
scored for the co-occurrence of related terms (A, B, D and
E). These scores can determine how the results are
presented to the end-user. This method can be extended to
score for the proximity of the related term to the original
search term.
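The co-occurrence scoring in this example can be sketched as follows. It assumes, for illustration, that each record in the results set for C is reduced to the set of terms it contains, and that a record's score is the number of related terms (A, B, D, E) for which at least one synonym occurs in it. The synonyms "a1" and "d1" and the record ids are hypothetical, and the proximity extension is not shown.

```python
def cooccurrence_scores(results, related_groups):
    """Score each record in the results set for term C.

    results: record id -> set of terms found in the record.
    related_groups: related term (A, B, D, E) -> set of its synonyms.
    Score = number of related terms with at least one synonym in the record.
    """
    return {rid: sum(1 for syns in related_groups.values() if terms & syns)
            for rid, terms in results.items()}


related = {"A": {"A", "a1"}, "B": {"B"}, "D": {"D", "d1"}, "E": {"E"}}
results_for_c = {"r1": {"C", "a1", "D"}, "r2": {"C"}, "r3": {"C", "B", "E", "d1"}}

scores = cooccurrence_scores(results_for_c, related)
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)  # {'r1': 2, 'r2': 0, 'r3': 3}
print(ranked)  # ['r3', 'r1', 'r2']
```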

For more structured information databases, such as the
biomedical science databases used in the present example,
context filtering can be performed using the "domain" field
as mentioned earlier. Upon construction of the data
repository 5, the records are assigned to specific
"domains" which represent broad topic classes such as DNA,
disease, and so on. In this case, synonyms in a single
search database record relate to information database
records within a single domain. The search for records
within the repository 5 can therefore be limited to records
in the domain relating to the synonyms within the group
of interest. For example, if a database has fields
relating to species and disease, then a single record can be
mapped to the search database by searching each field
independently, using synonyms from the species and disease
fields respectively. A combination of these and other
techniques can therefore be used to effect context
filtering. This filtering may be performed following
retrieval of all of the records, as in the present case, or
it may be performed "on-the-fly".
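Both structured-database techniques are sketched below: restricting records to the domain of the selected group, and mapping one record to search-database groups field by field. The record layout, the group ids "species-1" and "disease-1", and the disease synonyms are hypothetical placeholders.

```python
def filter_by_domain(records, group_domain):
    """Keep only repository records assigned to the selected group's domain."""
    return [r for r in records if r.get("DOMAIN") == group_domain]


def map_record(record, groups_by_field):
    """Map one record to search-database groups, searching field by field.

    groups_by_field: field name -> {group id: synonyms for that field type}.
    Each field is searched only with synonyms of its own type, so a term in
    the SPECIES field is never matched against DISEASE groups.
    """
    return {
        field_name: sorted(gid for gid, syns in groups.items()
                           if record.get(field_name, set()) & syns)
        for field_name, groups in groups_by_field.items()
    }


records = [
    {"id": "AF156159", "DOMAIN": "DNA", "SYNONYM": {"HTR2B"}},
    {"id": "rec-2", "DOMAIN": "disease", "SYNONYM": {"HTR2B"}},
]
print([r["id"] for r in filter_by_domain(records, "DNA")])  # ['AF156159']

# Hypothetical group ids and synonyms, for illustration only.
groups_by_field = {
    "SPECIES": {"species-1": {"Homo sapiens", "human"}},
    "DISEASE": {"disease-1": {"disease X", "X syndrome"}},
}
record = {"SPECIES": {"human"}, "DISEASE": {"X syndrome"}}
print(map_record(record, groups_by_field))
# {'SPECIES': ['species-1'], 'DISEASE': ['disease-1']}
```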

The retrieved and context-filtered records from the
data repository 5 are presented to the user at step 108.
On selection of a particular record of interest by the
user, at step 109 the pointer within the particular
repository record of interest is accessed to discover the
identity of the corresponding record upon one of the
information databases 2,3,4. This full record is then
retrieved from the specific information database and
displayed to the user at step 110.
The above method can therefore advantageously be used
to search for related information in databases which use
different but synonymous terms to describe similar
information. The selection of the extent to which terms
are synonymous is at the discretion of the system
administrator. Broader searches can be performed by using
related rather than synonymous terms.

Although the amount of information searched is
potentially in excess of that searched using a single
database, the speed and efficiency of the searching is
significantly increased by the use of the data repository
in which selected record extracts are used for searching
purposes.

In the present system, the user is not limited to
searching using the technique described above, as the method
can be integrated with other conventional database
searching tools which access the repository or the
information databases directly.
CLAIMS



1. A method of searching a plurality of information
databases for records related to an input search term,
comprising:-

selecting a group of related search terms containing
the input search term, from a search database of terms
arranged in predefined groups according to their
relationship with one another, wherein each term is present
within one or more of the information databases; and,

searching for terms from the selected group within a
data repository comprising selected data previously
extracted from the records of each information database, to
identify the corresponding records within the information
databases which contain the terms within the selected
group.

2. A method according to claim 1, wherein the data
repository is arranged as a number of records, each record
corresponding to a record present within one of the
information databases.

3. A method according to claim 2, wherein each record in
the repository comprises a pointer identifying the record
in the information database to which it relates.

4. A method according to any of the preceding claims,
wherein the amount of selected data in the repository is
less than that contained in the information databases.

5. A method according to any of the preceding claims, 
wherein the data in the repository comprises definitional 
data. 

6. A method according to claim 5, wherein the definitional
data describe data in terms of its nature, use or value.

7. A method according to any of the preceding claims, 
wherein the data in the repository comprises semantic data. 

8. A method according to claim 7, wherein the semantic
data describes alternative terms for the data in the
information database.

9. A method according to claim 8, wherein the semantic
data describe synonymous terms in the information
databases.

10. A method according to any of the preceding claims,
wherein each term in each predefined group within the
search database has associated meta-data indicating the one
or more information databases within which the term is
contained.

11. A method according to claim 10, wherein the
corresponding meta-data indicates the one or more fields of
the information database(s) within which it is contained.

12. A method according to any of the preceding claims,
wherein a number of records within the data repository are
assigned to a domain.

13. A method according to any of the preceding claims,
wherein the terms in the predefined groups within the
search database are synonymous terms.

14. A method according to any of the preceding claims,
wherein each group has an associated identifier.

15. A method according to claim 13 or claim 14, wherein
each group has associated descriptive data for describing
the group.

16. A method according to any of the preceding claims,
further comprising determining the context of any
repository records located.

17. A method according to claim 16 and when dependent upon
claim 12, wherein the context is determined by limiting the
search to repository records having a common domain.

18. A method according to claim 16 or claim 17, wherein the
context is determined by searching for the presence of one
or more of the other terms within the group, in the same
record of the repository.

19. A method according to any of claims 16 to 18, wherein
the context is determined by searching in related classes
of terms.

20. A method according to any of claims 16 to 19, wherein
the context is determined by the proximity of one or more
related terms within a record.

21. A computer program comprising computer program code
means adapted to perform the method according to any of the
preceding claims.

22. A computer program according to claim 21, embodied 
upon a computer readable medium. 

23. A database searching system for searching a plurality
of information databases for records related to an inputted
search term, the system comprising:-

a search database comprising related search terms
arranged into predefined groups according to their
relationship to one another, wherein each term is present
within one or more of the information databases;

selection means, for selecting a group containing the
inputted search term from the search database;

a data repository comprising selected data previously
extracted from the records of each information database;
and,

searching means for searching the repository for terms
from the selected group to identify the corresponding
records within the information databases which contain the
terms within the selected group.

24. A system according to claim 23, further
comprising an input means for supplying the inputted search
term to the selection means.

25. A system according to claim 24, wherein the input
means comprises a communication network such that the
inputted search term is received from a remote location.

26. A system according to any of claims 23 to 25,
further comprising a plurality of information databases
from which data is extracted for storage within the data
repository.

27. A system according to any of claims 23 to 26, wherein
the data repository is stored upon a separate computer
system with respect to the information databases.